Data GroundTruth, Complexity, and Evaluation Measures for Color Document Analysis

Authors

  • Leon Todoran
  • Marcel Worring
  • Arnold W. M. Smeulders

Abstract

Published work on color document image analysis reports results on small datasets that are not publicly available. In this paper we propose a well-defined, groundtruthed color dataset consisting of over 1000 pages, together with associated evaluation tools. The groundtruthing and evaluation tools for the color data are based on a well-defined document model, on complexity measures that assess the inherent difficulty of analyzing a page, and on well-founded evaluation measures. Together they form a suitable basis for evaluating diverse applications in color document analysis.
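
The abstract does not spell out the evaluation measures themselves. As a purely illustrative sketch, and not the measure defined in the paper, the snippet below scores one detected page region against one groundtruth region with the common area-overlap ratio (intersection over union). It assumes both regions are axis-aligned bounding boxes in pixel coordinates; the `Box` type and `overlap_ratio` function are hypothetical names introduced here.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical axis-aligned region: (x0, y0) top-left, (x1, y1) bottom-right, exclusive."""
    x0: int
    y0: int
    x1: int
    y1: int

    def area(self) -> int:
        # Clamp to zero so an empty or inverted box has no area.
        return max(0, self.x1 - self.x0) * max(0, self.y1 - self.y0)


def overlap_ratio(detected: Box, truth: Box) -> float:
    """Intersection over union of a detected region and a groundtruth region (0.0 if disjoint)."""
    inter = Box(
        max(detected.x0, truth.x0),
        max(detected.y0, truth.y0),
        min(detected.x1, truth.x1),
        min(detected.y1, truth.y1),
    )
    inter_area = inter.area()
    union = detected.area() + truth.area() - inter_area
    return inter_area / union if union > 0 else 0.0


# Two 10x10 regions that share half their width overlap by 50 / 150.
print(overlap_ratio(Box(0, 0, 10, 10), Box(5, 0, 15, 10)))
```

A score of 1.0 means a perfect match and 0.0 means no overlap. A full evaluation would additionally have to match many detected regions to many groundtruth regions and account for page complexity, which this sketch does not attempt.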

Similar resources

Title of Thesis: GROUNDTRUTH GENERATION AND DOCUMENT IMAGE DEGRADATION

Gang Zi, Master of Science, 2005. Thesis directed by Professor Rama Chellappa, Department of Electrical and Computer Engineering, University of Maryland at College Park. The problem of generating synthetic data for the training and evaluation of document analysis systems has been widely addressed in recent years. With the incre...

An Automatic Closed-Loop Methodology for Generating Character Groundtruth for Scanned Documents

Character groundtruth for real, scanned document images is crucial for evaluating the performance of OCR systems, training OCR algorithms, and validating document degradation models. Unfortunately, manual collection of accurate groundtruth for characters in a real (scanned) document image is not practical because (i) accuracy in delineating groundtruth character bounding boxes is not high enoug...

Automatic Generation of Character Groundtruth for Scanned Documents: A Closed-Loop Approach - Proceedings of the 13th International Conference on Pattern Recognition, 1996

Character groundtruth for scanned document images is crucial for evaluating the performance of OCR systems, training OCR algorithms, and validating document degradation models. Unfortunately, manual collection of accurate groundtruth for characters in a real (scanned) document image is not possible because (i) accuracy in delineating groundtruth character bounding boxes is not high enough, (ii)...

Automatic generation of character groundtruth for scanned documents: a closed-loop approach

Character groundtruth for scanned document images is crucial for evaluating the performance of OCR systems, training OCR algorithms, and validating document degradation models. Unfortunately, manual collection of accurate groundtruth for characters in a real (scanned) document image is not possible because (i) accuracy in delineating groundtruth character bounding boxes is not high enough, (ii)...

Publication date: 2002